Optimization Techniques for a Physical Model of Human Vocalisation
We present a non-supervised approach to optimize and evaluate the synthesis
of non-speech audio effects from a speech production model. We use the Pink
Trombone synthesizer as a case study of a simplified production model of the
vocal tract to target non-speech human audio signals, specifically yawns. We selected
and optimized the control parameters of the synthesizer to minimize the
difference between real and generated audio. We validated the most common
optimization techniques reported in the literature and a specifically designed
neural network. We evaluated several popular quality metrics as error
functions. These include both objective quality metrics and
subjective-equivalent metrics. We compared the results in terms of total error
and computational demand. Results show that genetic and swarm optimizers
outperform least squares algorithms at the cost of slower execution, and that
specific combinations of optimizers and audio representations offer
significantly different results. The proposed methodology could be used in
benchmarking other physical models and audio types.
Comment: Accepted to DAFx 202
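
The parameter-matching loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Pink Trombone synthesizer: `toy_synth` is a hypothetical one-sine stand-in, the error function is a plain squared difference between magnitude spectra, and the swarm optimizer is a basic textbook variant with assumed constants (inertia 0.7, attraction weights 1.5). The sample rate, frame length, and parameter bounds are likewise assumptions chosen to keep the example small.

```python
import math
import random

SR = 8000  # sample rate in Hz (assumed for this toy example)
N = 64     # samples per analysis frame (assumed)

def toy_synth(freq, amp):
    """Hypothetical stand-in for a synthesizer: a single sine wave."""
    return [amp * math.sin(2 * math.pi * freq * t / SR) for t in range(N)]

# Precomputed DFT basis for the first N/2 frequency bins.
COS = [[math.cos(2 * math.pi * k * t / N) for t in range(N)] for k in range(N // 2)]
SIN = [[math.sin(2 * math.pi * k * t / N) for t in range(N)] for k in range(N // 2)]

def magnitude_spectrum(x):
    """Magnitude of a naive DFT, one value per bin."""
    mags = []
    for c, s in zip(COS, SIN):
        re = sum(ci * xi for ci, xi in zip(c, x))
        im = sum(si * xi for si, xi in zip(s, x))
        mags.append(math.hypot(re, im) / N)
    return mags

def spectral_error(params, target_mag):
    """Squared difference between generated and target magnitude spectra."""
    mag = magnitude_spectrum(toy_synth(*params))
    return sum((a - b) ** 2 for a, b in zip(mag, target_mag))

def particle_swarm(error_fn, bounds, n_particles=20, n_iters=40, seed=0):
    """Basic particle swarm search inside box bounds; returns (params, error)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_err = [error_fn(p) for p in pos]
    g = min(range(n_particles), key=best_err.__getitem__)
    g_pos, g_err = best_pos[g][:], best_err[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            err = error_fn(pos[i])
            if err < best_err[i]:
                best_pos[i], best_err[i] = pos[i][:], err
                if err < g_err:
                    g_pos, g_err = pos[i][:], err
    return g_pos, g_err

# Usage: try to recover the parameters of a known "recording".
target_mag = magnitude_spectrum(toy_synth(1000.0, 0.8))
bounds = [(200.0, 3000.0), (0.1, 1.0)]  # (frequency in Hz, amplitude)
best, err = particle_swarm(lambda p: spectral_error(p, target_mag), bounds)
print("best parameters:", best, "error:", err)
```

Swapping `spectral_error` for another audio representation, or `particle_swarm` for a least squares or genetic routine, gives the kind of optimizer and error-function pairing that the abstract compares in terms of total error and computational demand.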